Divide & Conquer-based Inclusion Dependency Discovery

Authors

  • Thorsten Papenbrock
  • Sebastian Kruse
  • Jorge-Arnulfo Quiané-Ruiz
  • Felix Naumann
Abstract

The discovery of all inclusion dependencies (INDs) in a dataset is an important part of any data profiling effort. Apart from the detection of foreign key relationships, INDs can help to perform data integration, query optimization, integrity checking, or schema (re-)design. However, the detection of INDs gets harder as datasets become larger in terms of both the number of tuples and the number of attributes. To this end, we propose Binder, an IND detection system that is capable of detecting both unary and n-ary INDs. It is based on a divide & conquer approach, which allows it to handle very large datasets, an important property in the face of the ever-increasing size of today's data. In contrast to most related work, we neither rely on existing database functionality nor assume that the inspected datasets fit into main memory. This renders Binder an efficient and scalable competitor. Our exhaustive experimental evaluation shows that Binder clearly outperforms the state of the art in both unary (Spider) and n-ary (Mind) IND discovery. Binder is up to 26x faster than Spider and more than 2500x faster than Mind.
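To make the idea of a unary IND and of a divide & conquer check concrete, the following is a minimal sketch in Python. It is not Binder's actual implementation, which the abstract does not describe in detail. It assumes the dataset is a small in-memory dictionary of columns, that NULL values are ignored, and that values are partitioned by hash. The names `partition` and `unary_inds` are hypothetical and chosen only for this illustration.

```python
from collections import defaultdict


def partition(columns, n_buckets):
    """Divide: split the values of every column into hash buckets.

    Equal values always land in the same bucket, so each bucket can be
    checked independently of the others.
    """
    buckets = [defaultdict(set) for _ in range(n_buckets)]
    for attr, values in columns.items():
        for v in values:
            if v is None:  # assumption of this sketch: NULLs are ignored
                continue
            buckets[hash(v) % n_buckets][attr].add(v)
    return buckets


def unary_inds(columns, n_buckets=4):
    """Conquer: return all pairs (A, B) such that values(A) is a subset of values(B)."""
    attrs = set(columns)
    # Start by assuming every attribute is included in every other one.
    refs = {a: attrs - {a} for a in attrs}
    for bucket in partition(columns, n_buckets):
        # Invert the bucket: value -> set of attributes that contain it.
        owners = defaultdict(set)
        for attr, values in bucket.items():
            for v in values:
                owners[v].add(attr)
        # A value of A rules out every candidate B that lacks that value.
        for attr_group in owners.values():
            for a in attr_group:
                refs[a] &= attr_group
    return sorted((a, b) for a, bs in refs.items() for b in bs)


example = {
    "orders.customer_id": [1, 2, 2, 3],
    "customers.id": [1, 2, 3, 4],
    "customers.name": ["a", "b", "c", "d"],
}
print(unary_inds(example))
# [('orders.customer_id', 'customers.id')]
```

Because each bucket is processed on its own, a variant of this sketch could write buckets to disk and load them one at a time, which is one plausible way to avoid the assumption that the whole dataset fits into main memory.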

Similar resources

Free Vibration Analysis of Repetitive Structures using Decomposition, and Divide-Conquer Methods

This paper consists of three sections. In the first section, an efficient method is used for decomposition of the canonical matrices associated with repetitive structures. To this end, a cylindrical coordinate system as well as a special numbering scheme were employed. In the second section, the divide and conquer method has been used for eigensolution of these structures, where the matrices are in ...

A Divide-and-Conquer Strategy for Parsing

In this paper, we propose a novel strategy which is designed to enhance the accuracy of the parser by simplifying complex sentences before parsing. This approach involves the separate parsing of the constituent sub-sentences within a complex sentence. To achieve that, the divide-and-conquer strategy first disambiguates the roles of the link words in the sentence and segments the sentence based...

Knowledge Reduction Based on Divide and Conquer Method in Rough Set Theory

The divide and conquer method is a typical granular computing method using multiple levels of abstraction and granulation. Although some results based on the divide and conquer method have been achieved in rough set theory, systematic methods for knowledge reduction based on the divide and conquer method are still absent. In this paper, the knowledge reduction approaches based on...

Plan Mining by Divide-and-Conquer

Plans or sequences of actions are an important form of data. With the proliferation of database technology, plan databases, or planbases, are increasingly common. Efficient discovery of important patterns of actions in plan databases presents a challenge to data mining. In this paper, we present a method for mining significant patterns of successful actions in a large planbase using a divide and conquer ...

On Parity based Divide and Conquer Recursive Functions

The parity based divide and conquer recursion trees are introduced where the sizes of the tree do not grow monotonically as n grows. These non-monotonic recursive functions called fogk(n) and f̃ogk(n) are strictly less than linear, o(n) but greater than logarithm, Ω(logn). Properties of fogk(n) such as non-monotonicity, upper and lower bounds, etc. are examined and proven. These functions are us...

Journal:
  • PVLDB

Volume 8  Issue

Pages  -

Publication date 2015